Rank | Count | Beginning |
---|---|---|
206353 | 37972 | The |
76718 | 7986 | He |
104665 | 7120 | In |
93126 | 6788 | I |
116276 | 6389 | It |
84 | 5690 | A |
36593 | 5553 | But |
257796 | 4998 | This |
13137 | 3881 | And |
275749 | 3650 | We |
252921 | 3355 | They |
186292 | 3151 | She |
96416 | 2827 | If |
93128 | 2769 | “I |
22052 | 2727 | As |
237152 | 2356 | There |
122522 | 2180 | It’s |
275747 | 2180 | “We |
66872 | 2167 | For |
202783 | 2032 | That |
206488 | 2012 | “The |
286632 | 1960 | When |
162177 | 1925 | On |
296472 | 1856 | You |
192319 | 1731 | So |
4973 | 1637 | After |
26480 | 1493 | At |
90611 | 1476 | However, |
289589 | 1455 | While |
1247 | 1450 | According |
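The following four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into how sentences are typically constructed.
For N = 1 in particular, even a small corpus suffices to identify the most frequent sentence beginnings, because the counts are concentrated on a few very frequent words: "The" alone begins 37,972 sentences, more than four times as many as the runner-up "He" (7,986).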
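The counts in the table can be computed with the following MySQL query, which takes the text before the first space as a sentence's first word:

```sql
-- First word of each sentence = text before the first space.
SELECT SUBSTRING_INDEX(sentence, ' ', 1) AS beg, COUNT(*) AS cnt
FROM sentences
GROUP BY beg
ORDER BY cnt DESC
LIMIT 50;
```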
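The same approach extends to beginnings of N words, because `SUBSTRING_INDEX(str, delim, count)` returns everything to the left of the count-th occurrence of `delim`. The following is a minimal sketch, not necessarily the query used for the later subsections. It assumes MySQL, the same `sentences` table with a `sentence` column, and words separated by single spaces. The user variable `@n` is a hypothetical parameter introduced here. The `WHERE` clause is also an addition: without it, sentences with fewer than N words would be returned whole by `SUBSTRING_INDEX` and counted as if they were N-word beginnings.

```sql
-- Hypothetical parameter: number of leading words N (1 to 4 in this section).
SET @n = 2;

-- SUBSTRING_INDEX(sentence, ' ', @n) returns the text before the @n-th space,
-- i.e. the first @n space-separated words of the sentence.
SELECT SUBSTRING_INDEX(sentence, ' ', @n) AS beg, COUNT(*) AS cnt
FROM sentences
-- Keep only sentences with at least @n words (at least @n - 1 spaces);
-- shorter sentences would otherwise be counted as a whole.
WHERE CHAR_LENGTH(sentence) - CHAR_LENGTH(REPLACE(sentence, ' ', '')) >= @n - 1
GROUP BY beg
ORDER BY cnt DESC
LIMIT 50;
```

With `@n = 1` the `WHERE` condition is always true, so the result matches the first-word query.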